ABSTRACT

This study is an attempt to answer the following research question:
can the reliability of a criterion-referenced test be accurately
determined according to a multiple classification of the student's
performance? Specifically, the study pursues the beta-multinomial model,
which postulates the probability distribution of an examinee's degree of
mastery on a criterion-referenced test. From this model, a procedure for
assessing the reliability of the testing instrument was developed.
Simulated data based on the beta-multinomial distributions did not depart
significantly from those generated by the beta-binomial model. However,
these results should not preclude the utility of beta-multinomial models
in this context. (Author/BW)
UTILITY OF THE BETA-MULTINOMIAL DISTRIBUTION
IN MULTIPLE CLASSIFICATION SCHEME



Abstract

The present study is an attempt to answer the following research
question: Can the reliability of a criterion-referenced test be accurately
determined according to a multiple classification of the student's
performance? Specifically, this study pursues a sound statistical model,
i.e., the beta-multinomial model, which postulates the probability
distribution of an examinee's degree of mastery on a criterion-referenced
test. From this model, a procedure for assessing the reliability of the
testing instrument can then be developed. Ideally and finally, several
real-life data sets should have been employed in order to justify
empirically (or refine) this reliability estimation procedure. Results
from this study should help resolve some knotty psychometric difficulties
which are presently hindering the progress of the criterion-referenced
testing movement.

Background

Within the domain of criterion-referenced testing, various methods
exist in the literature which are intended to assess the reliability
of a test (Subkoviak, 1979). Among these procedures, Huynh's single-
administration approach has received much attention due to the elegance
of its model and the tolerable bias associated with its estimates (Huynh,
1976; Subkoviak, 1978). Subsequently, Huynh's procedure was thoroughly
investigated and simplified for classroom teachers or practitioners who
might not have access to a computer (Peng and Subkoviak, 1980).

The Beta-Binomial Model 

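Two major assumptions underlie Huynh's procedure:

(I) A binomial density function is assumed for the distribution of scores
(x) for an examinee with true ability ζ over repeated n-item tests.
Therefore,

\[ f(x \mid \zeta) = \binom{n}{x}\, \zeta^{x} (1-\zeta)^{n-x}, \]

where ζ is the proportion of items in the item population that an examinee
can correctly answer.

(II) A beta distribution for ζ, with parameters α and β, is assumed across
the population,

\[ g(\zeta) = \frac{\zeta^{\alpha-1} (1-\zeta)^{\beta-1}}{B(\alpha, \beta)}, \qquad 0 \le \zeta \le 1. \]

Under these assumptions, it can be shown (Keats and Lord, 1962) that
the probability distribution of x is a beta-binomial (or negative
hypergeometric) distribution with the following form:

\[ f(x) = \binom{n}{x}\, \frac{B(\alpha + x,\; n + \beta - x)}{B(\alpha, \beta)}, \]

where n = number of items on a test and
B( , ) = a beta function defined by the parameters in the parentheses.

A bivariate beta-binomial distribution is determined similarly:

\[ f(x, y) = \binom{n}{x} \binom{n}{y}\, \frac{B(\alpha + x + y,\; 2n + \beta - x - y)}{B(\alpha, \beta)}. \]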
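As a rough illustration of the two distributions just described, the sketch below evaluates the univariate and bivariate beta-binomial probabilities in Python. The function names and the use of log-gamma arithmetic are assumptions made for this example only; they are not part of Huynh's published procedure.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(x, n, alpha, beta):
    """f(x) = C(n, x) * B(alpha + x, n + beta - x) / B(alpha, beta)."""
    log_ratio = log_beta(alpha + x, n + beta - x) - log_beta(alpha, beta)
    return comb(n, x) * exp(log_ratio)


def bivariate_beta_binomial_pmf(x, y, n, alpha, beta):
    """f(x, y) for two parallel n-item forms sharing the same true ability.

    f(x, y) = C(n, x) C(n, y) B(alpha + x + y, 2n + beta - x - y) / B(alpha, beta)
    """
    log_ratio = (log_beta(alpha + x + y, 2 * n + beta - x - y)
                 - log_beta(alpha, beta))
    return comb(n, x) * comb(n, y) * exp(log_ratio)
```

Summing `bivariate_beta_binomial_pmf` over y for a fixed x recovers `beta_binomial_pmf(x, ...)`, which is the property that lets a single administration stand in for two parallel forms.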

Reliability Indices derived from the Beta-Binomial Model 

Under the beta-binomial model, a criterion-referenced test is simply
a mastery test. A mastery test typically classifies an examinee into
one of two categories, master or nonmaster, according to a predetermined
criterion or cutoff. Figure 1 below depicts this general decision-making
framework.
When two parallel forms X and Y exist, the probability of consistent
classification of pupils is composed of two elements: the probability of
a nonmaster being consistently identified by both forms and that of a
master being identified again by X and Y. Mathematically, this probability
can be expressed as

\[ P = P_{\text{consistent classification}} = P_{00} + P_{11}, \]

where P_00 and P_11 are the probabilities that both forms classify an
examinee as a nonmaster and as a master, respectively.

This binary classification is equally imposed on individual items from
the perspective of a beta-binomial model. Alternatively, a standardized
kappa coefficient can also be used to quantify reliability. This leads
to the following definition:

\[ \kappa = \frac{P - P_{\text{chance}}}{P_{\max} - P_{\text{chance}}} = \frac{P - (P_0^2 + P_1^2)}{1 - (P_0^2 + P_1^2)}, \]

where P_0 and P_1 are the marginal probabilities of a nonmaster and a
master classification on a single form.
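A minimal sketch of how P and kappa might be computed for a mastery test under the beta-binomial model is given below. It assumes the bivariate form f(x, y) = C(n, x) C(n, y) B(α+x+y, 2n+β−x−y) / B(α, β), a rule in which a score at or above the cutoff means "master", and a maximum agreement of 1; the function names are hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def joint_pmf(x, y, n, alpha, beta):
    """Bivariate beta-binomial probability of scores x and y on parallel forms."""
    log_ratio = (log_beta(alpha + x + y, 2 * n + beta - x - y)
                 - log_beta(alpha, beta))
    return comb(n, x) * comb(n, y) * exp(log_ratio)


def mastery_consistency(n, cutoff, alpha, beta):
    """Return (P, kappa) for a master/nonmaster decision at the given cutoff."""
    low = range(cutoff)              # nonmaster scores: 0 .. cutoff - 1
    high = range(cutoff, n + 1)      # master scores: cutoff .. n
    every = range(n + 1)
    p00 = sum(joint_pmf(x, y, n, alpha, beta) for x in low for y in low)
    p11 = sum(joint_pmf(x, y, n, alpha, beta) for x in high for y in high)
    p0 = sum(joint_pmf(x, y, n, alpha, beta) for x in low for y in every)
    p1 = 1.0 - p0
    p = p00 + p11                    # probability of consistent classification
    chance = p0 ** 2 + p1 ** 2       # agreement expected by chance
    kappa = (p - chance) / (1.0 - chance)
    return p, kappa
```

For instance, `mastery_consistency(n=10, cutoff=8, alpha=2.0, beta=2.0)` returns the pair (P, kappa) for a 10-item test with a cutoff of 8 under an assumed Beta(2, 2) ability distribution.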






Statement of the Problem

Unfortunately, Huynh's approach as well as the simplified procedure
assumes that an examinee either masters or fails a test. In other words,
these approaches are restricted to mastery tests only. It is, however,
more realistic to assume that a typical pupil is capable of mastering
a portion, if not the entirety, of all the materials taught. Hence, a
multiple classification scheme on items and tests seems reasonable for
determining a student's level of mastery on a criterion-referenced test.
This suggests the development of the beta-multinomial model, which is an
expansion of the beta-binomial model underlying the Huynh method.

The Beta-Multinomial Model

Three useful references are given by Cheng (1964, in modern Chinese),
Ishii and Hayakawa (1960), and Mosimann (1962). The original manuscripts
were published in separate and yet remote locations around the world; hence,
they signalled an alarming message for more headaches in days to come as
long as I remained interested in pursuing this line of research. (Sigh!)

Two major assumptions are implied by the beta-multinomial model:

(1) A multinomial density function is assumed for the conditional
distribution of scores x (= X_1 + wX_2) for an examinee with true ability
(ζ_1, ζ_2) over repeated N-item tests:

\[ f(x = X_1 + wX_2 \mid \zeta_1, \zeta_2) = \frac{N!}{X_1!\, X_2!\, (N - X_1 - X_2)!}\; \zeta_1^{X_1}\, \zeta_2^{X_2}\, (1 - \zeta_1 - \zeta_2)^{N - X_1 - X_2}, \qquad 0 \le x = X_1 + wX_2 \le N, \]

where X_1 = number of items that an examinee can completely master,
      X_2 = number of items that an examinee can partially master,
      w   = partial credit awarded to items on which an examinee
            demonstrates partial mastery, which equals a constant term
            in the equation,
      ζ_1 = the proportion of items in the item population that an
            examinee can correctly answer, and
      ζ_2 = the proportion of items in the item population that an
            examinee can partially answer.

(2) A multivariate beta distribution for ζ_1 and ζ_2 is assumed across the
population of examinees.

Under these assumptions, it can be shown (Mosimann, 1962) that
the probability distribution of X is a compound beta-multinomial:

\[ f(X_1, X_2) = \frac{N!}{X_1!\, X_2!\, (N - X_1 - X_2)!}\; \frac{B\!\left(\alpha_1 + X_1,\; \sum_{i=2}^{3} \alpha_i + N - X_1\right) B(\alpha_2 + X_2,\; \alpha_3 + N - X_1 - X_2)}{B\!\left(\alpha_1,\; \sum_{i=2}^{3} \alpha_i\right) B(\alpha_2, \alpha_3)}, \]

where B( , ) = a beta function defined by the parameters in the
parentheses.
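The compound distribution can be evaluated numerically with a few lines of Python. The sketch below assumes the three-parameter form f(X_1, X_2) = [N! / (X_1! X_2! X_3!)] B(α_1+X_1, α_2+α_3+N−X_1) B(α_2+X_2, α_3+X_3) / [B(α_1, α_2+α_3) B(α_2, α_3)] with X_3 = N − X_1 − X_2; the function names are illustrative and do not come from Mosimann or Cheng.

```python
from math import exp, factorial, lgamma


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_multinomial_pmf(x1, x2, n, a1, a2, a3):
    """Probability of x1 mastered and x2 partially mastered items out of n."""
    x3 = n - x1 - x2                       # items not mastered at all
    coef = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    log_ratio = (log_beta(a1 + x1, a2 + a3 + n - x1)
                 + log_beta(a2 + x2, a3 + x3)
                 - log_beta(a1, a2 + a3)
                 - log_beta(a2, a3))
    return coef * exp(log_ratio)


def all_outcomes(n):
    """Every admissible (x1, x2) pair with x1 + x2 <= n."""
    return [(x1, x2) for x1 in range(n + 1) for x2 in range(n + 1 - x1)]
```

Note that the weight w does not enter the probability of an outcome (X_1, X_2); it only determines which composite score x = X_1 + wX_2 that outcome is mapped to.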



Estimation procedures for this complex parametric model are provided
in Cheng (1964). However, Cheng's procedures are far too sophisticated
to be implemented by practitioners in education. Simplified procedures
(such as the method of moments) ought to be developed, and the
applications of the beta-multinomial model in the literature deserve
an in-depth review.

When the beta-multinomial model is generalized to a joint distribution
of scores x and y on parallel tests, a bivariate beta-multinomial
distribution should result (by mathematical derivation). This bivariate
distribution, denoted by f(x,y), should have the same set of parameters
as f(x), since x and y are obtained from parallel tests and identical
criteria should be enforced in both cases. Hence, estimated parameters
developed in any estimation procedure should be sufficient in determining
the bivariate distribution of scores, f(x,y), which would result if
two tests were indeed administered. This rationale constitutes a sound
basis for developing a single-administration approach in assessing
the reliability of a criterion-referenced test.
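One way the derivation mentioned above could go is sketched here. It assumes that, given an examinee's true ability (ζ_1, ζ_2), the outcomes on the two parallel N-item forms are independent multinomial draws, and it writes X_3 = N − X_1 − X_2 and Y_3 = N − Y_1 − Y_2. Under that assumption, integrating over the multivariate beta density g gives a form that depends only on α_1, α_2, and α_3:

```latex
\begin{aligned}
f(x, y)
  &= \iint f(x \mid \zeta_1, \zeta_2)\, f(y \mid \zeta_1, \zeta_2)\,
     g(\zeta_1, \zeta_2)\, d\zeta_1\, d\zeta_2 \\
  &= \frac{N!}{X_1!\, X_2!\, X_3!}\; \frac{N!}{Y_1!\, Y_2!\, Y_3!}\;
     \frac{B(\alpha_1 + X_1 + Y_1,\; \alpha_2 + \alpha_3 + 2N - X_1 - Y_1)\;
           B(\alpha_2 + X_2 + Y_2,\; \alpha_3 + X_3 + Y_3)}
          {B(\alpha_1,\; \alpha_2 + \alpha_3)\; B(\alpha_2,\; \alpha_3)} .
\end{aligned}
```

Summing this expression over all admissible (Y_1, Y_2) returns the univariate beta-multinomial f(x), which is consistent with the requirement that f(x,y) and f(x) share one set of parameters.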



Proposed Procedure for Assessing Reliability Based on the Beta-Multinomial Model

Two phases: Simulated Data, and Real Data (very difficult to locate).

Simulated Data. Four steps are necessary:

Step 1-- Various values of alphas are considered according to the
         specification in Table 1 (page 6).
Step 2-- Specifications on test length (N) and cutoff scores (C_1 and C_2)
         are included in Table 2 (page 7).
Step 3-- Generate the f(x) and f(x,y) distributions based on Steps 1 and 2.
Step 4-- Develop a single-administration approach to compute P or kappa.

Tentatively,

\[ P = P_{00} + P_{11} + P_{22} \]

and

\[ \kappa = \frac{P - P_{\text{chance}}}{1 - P_{\text{chance}}}. \]
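The four steps can be sketched end to end for a three-category scheme. Everything below is an illustrative assumption rather than the procedure the author eventually developed: the bivariate beta-multinomial form, the rule that a composite score x = x1 + w*x2 below C1 falls in category 0, a score from C1 up to but not including C2 in category 1, and a score of C2 or more in category 2, and chance agreement taken as the sum of squared marginal category probabilities.

```python
from math import exp, factorial, lgamma


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def coef(n, x1, x2):
    """Multinomial coefficient N! / (x1! x2! (N - x1 - x2)!)."""
    return factorial(n) // (factorial(x1) * factorial(x2) * factorial(n - x1 - x2))


def joint_pmf(x, y, n, alphas):
    """Assumed bivariate beta-multinomial probability for parallel forms."""
    a1, a2, a3 = alphas
    (x1, x2), (y1, y2) = x, y
    x3, y3 = n - x1 - x2, n - y1 - y2
    log_ratio = (log_beta(a1 + x1 + y1, a2 + a3 + 2 * n - x1 - y1)
                 + log_beta(a2 + x2 + y2, a3 + x3 + y3)
                 - log_beta(a1, a2 + a3) - log_beta(a2, a3))
    return coef(n, x1, x2) * coef(n, y1, y2) * exp(log_ratio)


def category(outcome, w, c1, c2):
    """Map (x1, x2) to category 0, 1, or 2 using the composite score."""
    score = outcome[0] + w * outcome[1]
    return 0 if score < c1 else (1 if score < c2 else 2)


def three_way_consistency(n, alphas, w, c1, c2):
    """Return (P, kappa, marginals) with P = P00 + P11 + P22."""
    outcomes = [(x1, x2) for x1 in range(n + 1) for x2 in range(n + 1 - x1)]
    joint = [[0.0] * 3 for _ in range(3)]
    for x in outcomes:
        for y in outcomes:
            joint[category(x, w, c1, c2)][category(y, w, c1, c2)] += joint_pmf(x, y, n, alphas)
    p = sum(joint[k][k] for k in range(3))
    marginals = [sum(row) for row in joint]
    chance = sum(m * m for m in marginals)
    return p, (p - chance) / (1.0 - chance), marginals
```

A call such as `three_way_consistency(6, (1.0, 1.0, 1.0), 0.5, 2, 4)` enumerates every pair of outcomes on two 6-item forms, so the cost grows quickly with N; for the longer tests in Table 2 a smarter summation would be needed.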
Table 1. Selected Beta Distributions for Study

Case    Alphas        General description
I                     Uniform
II      .5, .5        U-shaped
III                   Symmetric, unimodal & platykurtic
IV                    Symmetric, unimodal & leptokurtic
                      Negatively skewed
Table 2. Selected Values of N, C_1 and C_2
(Cutoff scores by percentage; the product of N and the percentage is
shown in parentheses.)

N     45%         55%         65%         75%          85%          95%
5     3 (2.25)    3 (2.75)    4 (3.25)    4 (3.75)     5 (4.25)     5 (4.75)
10    5 (4.50)    6 (5.50)    7 (6.50)    8 (7.50)     9 (8.50)     10 (9.50)
15    7 (6.75)    9 (8.25)    10 (9.75)   12 (11.25)   13 (12.75)   15 (14.25)
20    9 (9.0)     11 (11.0)   13 (13.0)   15 (15.0)    17 (17.0)    19 (19.0)
30    14 (13.5)   17 (16.5)   20 (19.5)   23 (22.5)    26 (25.5)    29 (28.5)
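The legible entries of Table 2 are consistent with a simple rule: each cutoff is the product of N and the percentage, rounded up to the next whole item. The short sketch below reproduces the table under that assumed rule; the function name is hypothetical, and integer arithmetic is used so that products such as 20 x 45% are not pushed past 9 by floating-point error.

```python
PERCENTS = (45, 55, 65, 75, 85, 95)
TEST_LENGTHS = (5, 10, 15, 20, 30)


def cutoff_score(n_items, percent):
    """Smallest whole number of items that is at least percent% of n_items."""
    # Ceiling division on integers: ceil(n_items * percent / 100).
    return -(-n_items * percent // 100)


# One row of cutoffs per test length, in the column order of PERCENTS.
cutoff_table = {n: [cutoff_score(n, pct) for pct in PERCENTS]
                for n in TEST_LENGTHS}

for n, row in cutoff_table.items():
    print(n, row)
```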
Real Data Analysis. Four steps are also to be executed:

Step 1-- Estimate α_1, α_2, and α_3 via the method of moments. (This needs
         an in-depth review of the literature.)
Step 2-- Generate the f(x) and f(x,y) distributions based on Step 1 above.
Step 3-- Compute P and kappa according to the single-administration
         procedure developed in Step 4 under the simulated study.
Step 4-- Compare P against the true P obtained from the test-retest results.
         Also, perform the same contrast between kappa and true kappa to
         determine whether the beta-multinomial model along with the
         single-test administration procedure yields satisfactory results.
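As a starting point for Step 1, one simple moment-matching scheme is sketched below. It is an assumption made for illustration, not the estimator the literature review might settle on: category proportions are taken from the mean counts, and the sum A = α_1 + α_2 + α_3 is solved from the variance of the mastered-item count alone, using Var(X_1) = N p_1 (1 − p_1)(N + A) / (1 + A). All function names are hypothetical.

```python
from statistics import mean, pvariance


def alphas_from_moments(mean1, mean2, var1, n):
    """Match first moments of X1, X2 and the variance of X1 on an n-item test."""
    p1 = mean1 / n
    p2 = mean2 / n
    p3 = 1.0 - p1 - p2
    # Ratio of the observed variance to the plain binomial variance.
    r = var1 / (n * p1 * (1.0 - p1))
    if not 1.0 < r < n:
        raise ValueError("variance is incompatible with a beta-multinomial model")
    total = (n - r) / (r - 1.0)       # estimate of alpha1 + alpha2 + alpha3
    return total * p1, total * p2, total * p3


def alphas_from_scores(pairs, n):
    """pairs: one (x1, x2) tuple per examinee from a single administration."""
    x1 = [a for a, _ in pairs]
    x2 = [b for _, b in pairs]
    return alphas_from_moments(mean(x1), mean(x2), pvariance(x1), n)
```

The check on r matters in practice: when examinees are nearly homogeneous the observed variance can fall at or below the binomial variance, and no positive α values reproduce it.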



Preliminary Results Obtained from Simulated Data

In simulating artificial data from the beta-multinomial model, the
actual criterion scores (Table 2) were never utilized. Instead, the
proportion of mastered items and that of partially mastered items were
sufficient. Figures 2-5 (pp. 11-14) depict the probability distributions
of trichotomous data simulated from f(x) on page 4. Here, 10% refers to
the percent of mastered items whereas 70% refers to the partially mastered
items. Then on page 15, Figure 6 combines various beta functions with 5
distinct test lengths. The overlay effect shows clearly that the shape of
the compound beta-multinomial distribution is determined solely by
parameters of the beta functions. The weight coefficient (w), as one might
imagine, would not affect the probabilistic functions shown on pages 11-15.

When the percentages varied from a (10%, 70%) combination to a
(30%, 30%) combination, the appearance of beta-multinomial distributions
altered accordingly, although the general shape remained unchanged.

So, where is the beef? Sadly enough, the simulated data based on
the beta-multinomial distributions did not depart significantly from
those generated by the beta-binomial model (see Figures 9-13 for the
univariate cases and Figure 14 for one bivariate case). Perhaps this
was the main reason why Huynh preferred the beta-binomial model even for
cases involving multiple classifications (e.g., Huynh, 1978,
Psychometrika). His preference certainly should not preclude the utility
of beta-multinomial models in the present context. Conceptually, the
beta-multinomial model is well matched with the framework of a multiple
classification, probably more so than the simple beta-binomial model.
Before committing a fatal error in her conceptualization of the problem,
the author welcomes insights or comments on her proposed methodology.